Coverage games in small cells networks

M. Le Treust, H. Tembine, S. Lasaulce, and M. Debbah 

Abstract: This paper considers the problem of cooperative power control in distributed
small cell wireless networks. We introduce a novel framework, based on repeated games,
which models the interactions of the different transmit base stations in the downlink.
By exploiting the specific structure of the game, we show that a Pareto-optimal solution
can be selected as an equilibrium, which improves the system performance and reduces the
price of stability.
1. Introduction

In this paper we consider the problem of cooperative power control in distributed
wireless networks. In such networks, transmitters can decide their control policy freely.
Because of multi-user interference, the decisions of the different transmitters are
interdependent, which makes game theory a natural paradigm to study the problem of
distributed power control. As the power control game is played several times (e.g., because
a transmitter sends several packets), repeated games are considered. In repeated games,
selfish transmitters can have an interest in cooperating. This framework is very relevant
for the downlink of small cell networks, where small base stations have little information
available to implement cooperative power control policies. It is realistic to assume that
small base stations are connected, but only by low-capacity links. Therefore, repeated
games that require only limited signalling between the players are a suitable framework to
address the problem under investigation.

Repeated games with complete information (every player knows the set of players,
all the action spaces, all the payoff functions) are known to have several outcomes. A
well-known result in that field is the so-called Folk Theorem [1], [2], which characterizes
the set of equilibrium payoffs: if the players have perfect monitoring (every player is able
to observe at each stage the actions chosen by all the other players), every feasible and
individually rational payoff (at least the minmax point) can be obtained by an equilibrium
strategy of the repeated game. Whereas perfect monitoring can be achieved in certain
scenarios where appropriate estimation and sensing mechanisms are implemented, our goal
here is to show that some of these information assumptions can be relaxed by exploiting the
specific structure of the networking game. In particular, this means that one specific
operating point, such as a specific Pareto optimal solution or a global optimum, can be
approximated by repeated equilibrium play under suitable assumptions. As a consequence,
many results on the inefficiency of equilibria in static games can be re-examined in the
repeated game setting, and the performance can be improved. To this end, we consider a
proximity graph of monitoring, which allows us to relax the assumption that each player
fully observes the other players.

Our contribution is to provide explicit conditions on the coverage game under which an
optimal action plan can be implemented in the long-run game, inspired by the work of
Renault and Tomala (1998 [3]): any deviation of a player is followed by an identification
procedure that isolates the deviator, and then by an appropriate punishment plan.
One of the main consequences of this result is the possibility to select a specific operating
point, namely a Pareto optimal solution (such as a global optimum or a bargaining solution),
which can improve the system performance and hence reduce the price of stability
(PoS) (the gap between the best equilibrium payoff and the optimal social welfare).
Note that Nash equilibria, Wardrop equilibria and Stackelberg solutions can be suboptimal
and inefficient in generic static games. Under a proximity graph of strategic observation,
the repeated game approach can lead to the global optimum if the initial plan is this
operating point (see Theorem 6). In contrast to most learning algorithms, which try to
reach an "equilibrium" of the static game, here we examine equilibria of the long-run game,
which can be a global optimum for the static game. In particular, the price of stability is
one.

The rest of the paper is structured as follows. In Section 2, we present the network
model. In Section 3, we analyze the one-shot game between small base stations (SBSs).
In Section 4, we study the repeated coverage game.

2. Network model 

We consider $S$ small base stations (SBSs) and a large number of mobile stations (MSs). The
SBS index is denoted by $i \in \{1, \ldots, S\}$. The SBSs are assumed to exploit the same
frequency band. Around each SBS, mobile stations are assumed to be distributed geographically
in a plane according to a density $\lambda_i(x, y)$, where $(x, y) \in \mathbb{R}^2$ are the
coordinates of a mobile station. The density satisfies $\lambda_i(x, y) = 0$ outside a
range of radius $R_i$ from SBS $i$. We use the polar coordinate representation:
the radius from the origin is $r = (x^2 + y^2)^{1/2}$, and the angle $\theta \in [0, 2\pi[$
is determined by $\cos\theta = x/r$, $\sin\theta = y/r$ for $r \neq 0$. The relevant channel
model to describe this situation is the interference channel.

In the scenario under investigation, the interference channel comprises $S$ transmitters
and a large number of receivers. Let $l_1, \ldots, l_S$ be the locations of the SBSs,
$l_i = (l_{i1}, l_{i2}, l_{i3}) \in \mathbb{R}^3$ with $l_{i3} \neq 0$. We assume that
$l_i \neq l_j$, $\forall i \neq j$. We denote by
$d : \mathbb{R}^3 \times \mathbb{R}^3 \to \mathbb{R}_+$ the Euclidean distance in $\mathbb{R}^3$:

$$ d((x, y, z), (x', y', z')) := \left((x - x')^2 + (y - y')^2 + (z - z')^2\right)^{1/2}. $$

Let $d_i(x, y) := d(l_i, (x, y, 0))$ be the Euclidean distance from SBS $i$ to an MS located
at the position $(x, y)$:

$$ d_i(x, y) := d(l_i, (x, y, 0)) = \left((x - l_{i1})^2 + (y - l_{i2})^2 + l_{i3}^2\right)^{1/2} \geq |l_{i3}| > 0. $$

The transmit power of SBS $i$ is denoted by $p_i$. If an MS is located at $(x, y)$, the downlink
SINR associated with SBS $i$ is given by

$$ \mathrm{SINR}_i(x, y) = \frac{g_{ii}(x, y)\, p_i}{\sigma^2 + \sum_{j \neq i} g_{ji}(x, y)\, p_j} \tag{1} $$

where $g_{ii}(x, y) = a_{ii}(x, y)\, d_i(x, y)^{-\gamma_i}$ represents the channel gain (path loss)
of the link between SBS $i$ and an MS located at the point $(x, y)$,
$g_{ji}(x, y) = a_{ji}(x, y)\, d_j(x, y)^{-\gamma_j}$ represents the channel gain (path loss) of
the link between SBS $j$ and an MS located at the point $(x, y)$, $\gamma_j$ is a real number
with $\gamma_j > 2$, and $\sigma^2$ is the noise variance.
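As an illustration of the SINR model, the following minimal Python sketch evaluates the distance, the channel gain and the SINR at a given point. The SBS positions, the constant coefficient `a`, the exponent `gamma` and the noise level `sigma2` are hypothetical values chosen for the example, not parameters taken from the paper.

```python
import math

def distance(sbs, x, y):
    """Euclidean distance from an SBS at (l1, l2, l3) to a mobile station at (x, y, 0)."""
    l1, l2, l3 = sbs
    return math.sqrt((x - l1) ** 2 + (y - l2) ** 2 + l3 ** 2)

def gain(sbs, x, y, a=1.0, gamma=3.0):
    """Channel gain a * d^(-gamma); a is taken constant here (an assumption)."""
    return a * distance(sbs, x, y) ** (-gamma)

def sinr(i, sbs_list, powers, x, y, sigma2=1e-3):
    """Downlink SINR of SBS i at (x, y); every other SBS acts as an interferer."""
    signal = gain(sbs_list[i], x, y) * powers[i]
    interference = sum(
        gain(sbs_list[j], x, y) * powers[j]
        for j in range(len(sbs_list)) if j != i
    )
    return signal / (sigma2 + interference)

# Hypothetical layout: two SBSs at height 1, placed 10 units apart.
sbs_list = [(0.0, 0.0, 1.0), (10.0, 0.0, 1.0)]
powers = [1.0, 1.0]
near = sinr(0, sbs_list, powers, 1.0, 0.0)  # MS close to SBS 0
far = sinr(0, sbs_list, powers, 9.0, 0.0)   # MS close to SBS 1
```

In this toy layout the SINR of SBS 0 is large for the nearby mobile station and small for the one located next to the interfering SBS.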
Remark

The mapping $(x, y) \mapsto \mathrm{SINR}_i(x, y)$ has a finite limit at $(0, 0)$. We can choose
the function $a_{ii}(x, y)$ of the form $b_{ii}\, d_i(x, y)^{-\gamma_i}\, 1_{\{d_i(x, y) \leq R_i\}}$
(no interference from SBS $i$ outside a range of maximum radius $R_i$). Similarly,

$$ a_{ji}(x, y) = b_{ji}\, d_j(x, y)^{-\gamma_j}\, 1_{\{d_j(x, y) \leq R_j\}}, $$

where $b_{ii}$ and $b_{ji}$ are positive constants.

3. One-shot coverage game definition 



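The one-shot coverage game is defined by the triplet

$$ \mathcal{G} = \left(\mathcal{S}, \{\mathcal{P}_i\}_{i \in \mathcal{S}}, \{u_i\}_{i \in \mathcal{S}}\right). \tag{2} $$

The set $\mathcal{S} = \{1, \ldots, S\}$ is the set of players, who are the SBSs, the set
$\mathcal{P}_i$ is the action space of SBS $i$, and the utility function of SBS $i$ is defined
as follows:

$$ u_i(p_1, p_2, \ldots, p_S) = \iint_{\mathbb{R}^2} \lambda_i(x, y) \log_2\left(1 + \frac{g_{ii}(x, y)\, p_i}{\sigma^2 + \sum_{j \neq i} g_{ji}(x, y)\, p_j}\right) dx\, dy \tag{3} $$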



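Assume that $g_{ii}(x, y)$ has the above form. Then, $u_i(p_1, p_2, \ldots, p_S)$ can be rewritten as

$$ u_i(p_1, p_2, \ldots, p_S) = \int_0^{2\pi} \int_0^{R_i} \lambda_i(r\cos\theta, r\sin\theta) \log_2\left(1 + \frac{g_{ii}(r\cos\theta, r\sin\theta)\, p_i}{\sigma^2 + \sum_{j \neq i} g_{ji}(r\cos\theta, r\sin\theta)\, p_j}\right) r\, dr\, d\theta \tag{4} $$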
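The polar form of the utility lends itself to simple numerical integration. The sketch below approximates it with a midpoint rule for a single interferer; the uniform density `lam`, the constant gain functions and the noise level `sigma2` are assumptions made for illustration only.

```python
import math

def utility_polar(p_i, p_j, R=1.0, sigma2=0.1, lam=1.0,
                  g_ii=lambda r, th: 1.0, g_ji=lambda r, th: 0.5,
                  n_r=200, n_th=200):
    """Midpoint-rule approximation of the polar-form utility with one interferer j."""
    dr, dth = R / n_r, 2 * math.pi / n_th
    total = 0.0
    for a in range(n_r):
        r = (a + 0.5) * dr              # midpoint of the radial cell
        for b in range(n_th):
            th = (b + 0.5) * dth        # midpoint of the angular cell
            s = g_ii(r, th) * p_i / (sigma2 + g_ji(r, th) * p_j)
            total += lam * math.log2(1.0 + s) * r * dr * dth
    return total

# With constant gains and a uniform density the integral reduces to
# pi * R^2 * log2(1 + SINR), which gives a quick sanity check.
u_both_on = utility_polar(1.0, 1.0)
u_alone = utility_polar(1.0, 0.0)
```

Replacing the two gain functions with distance-dependent ones gives the general case, at the cost of a finer grid near the SBS.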



3.1 Existence and uniqueness of the one-shot Nash equilibrium 

The players are rational and the above description is common knowledge (every player
is rational and knows $\mathcal{G}$, every player knows that every player is rational and
knows $\mathcal{G}$, and so on). An important game solution concept is the Nash equilibrium,
i.e., a point from which no player has an interest in unilaterally deviating. An action
profile $p = (p_1, \ldots, p_S)$ is a pure Nash equilibrium of $\mathcal{G}$ if

$$ \forall i \in \mathcal{S},\ \forall p_i' \in \mathcal{P}_i, \quad u_i(p_1, \ldots, p_{i-1}, p_i, p_{i+1}, \ldots, p_S) \geq u_i(p_1, \ldots, p_{i-1}, p_i', p_{i+1}, \ldots, p_S). \tag{5} $$



We say that a strategy for a player is dominated if there exists another strategy
for her which is better for her no matter what choices the opponents make. Since
such strategies are suboptimal, the player would eliminate dominated strategies. This
dominance analysis reduces the set of possible outcomes of the game. In some games,
a dominance analysis leads to a unique prediction of the outcome when players are
rational. Such games are said to be dominance solvable.

The next lemma shows that our one-shot coverage game is a dominance solvable 
game. 

Lemma 1. The one-shot game is a dominance solvable game. 

As a corollary, the game $\mathcal{G}$ has a unique Nash equilibrium given by
$p^{NE} = (p_1^{\max}, \ldots, p_S^{\max})$. The minmax level is determined by $p^{NE}$. Denote
by $u_i^{NE} = u_i(p^{NE})$ the utility of SBS $i$ at the Nash equilibrium.
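The dominance argument can be checked numerically on a toy instance. The sketch below assumes two SBSs with constant gains `g` and `h` and a discretised action space; these values are hypothetical and serve only to show that the best response is always the maximum power.

```python
import math

P_MAX = 1.0
GRID = [k * P_MAX / 20 for k in range(21)]  # discretised action space [0, P_MAX]

def u(p_own, p_other, g=1.0, h=0.5, sigma2=0.1):
    """Toy utility with constant gains (assumed): rate of one SBS under interference."""
    return math.log2(1.0 + g * p_own / (sigma2 + h * p_other))

def best_response(p_other):
    """Power on the grid that maximises the utility against a fixed opponent power."""
    return max(GRID, key=lambda p: u(p, p_other))

# Whatever the opponent plays, the utility increases with the player's own power,
# so the set of best responses collapses to the maximum power.
responses = {best_response(q) for q in GRID}
```

Because the utility is strictly increasing in the player's own power, every lower power level is dominated, which is the mechanism behind the unique equilibrium at maximum power.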
3.2 Inefficiency of NE

If the ranges of the SBSs do not interfere, then the NE is a global optimum (maximum
sum utility). For interfering ranges of base stations, the Nash equilibrium is
clearly inefficient compared to the global optimum of the long-run interaction. For
example, for the long-term interaction, a time-sharing solution would be better than
the NE. Next, we define a specific Pareto optimal solution, namely a bargaining solution
(Nash bargaining, Rubinstein bargaining, Kalai-Smorodinsky bargaining, etc.). The
Kalai-Smorodinsky (denoted KS) bargaining solution jU |3] is one of the bargaining models
with a very nice geometric interpretation of the problem of "splitting a cake". The KS
bargaining solution is fair in the sense that the minimum utility is maximized. Moreover,
in the long-run game, the KS bargaining solution belongs to the convex hull of the
feasible utilities and strictly dominates the one-shot Nash equilibrium.

By definition, the KS bargaining solution corresponds to the maximum point of the
convex hull of feasible utilities that is located on the line joining a disagreement
point (for example the NE utility $u^{NE} = (u_i^{NE})_{i \in \mathcal{S}}$) and the maximum
utility profile, denoted $\bar{u} = (\bar{u}_i)_{i \in \mathcal{S}}$, where
$\bar{u}_i = \max_p u_i(p)$. Denote by $p^{KS} = (p_i^{KS})_{i \in \mathcal{S}}$ the power vector
that achieves the KS bargaining solution and by $u^{KS} = (u_i^{KS})_{i \in \mathcal{S}}$ the
corresponding utility profile. In Figure 1, we consider a simple example for which the KS
bargaining solution consists of a fair time sharing. In the next section, we develop a
repeated game approach leading to the KS bargaining solution.
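The geometric definition can be made concrete with a small computation. The sketch below assumes that the Pareto frontier of the convex hull is the time-sharing segment on which the normalised utilities $u_i / \bar{u}_i$ sum to one; this is a simplifying assumption of the example, and the function name and numerical values are hypothetical.

```python
def ks_time_sharing(u_ne, u_max):
    """KS point when the Pareto frontier is the time-sharing face
    sum_i u_i / u_max[i] = 1 (an assumption of this sketch).

    The point is found on the line joining the disagreement point u_ne
    to the maximum utility profile u_max.
    """
    a = sum(n / m for n, m in zip(u_ne, u_max))
    if a >= 1.0:
        raise ValueError("disagreement point is not strictly inside the time-sharing region")
    b = sum((m - n) / m for n, m in zip(u_ne, u_max))
    t = (1.0 - a) / b  # position of the frontier along the line
    return tuple(n + t * (m - n) for n, m in zip(u_ne, u_max))

# Hypothetical symmetric example: NE utility 1 each, maximum utility 4 each.
ks_sym = ks_time_sharing((1.0, 1.0), (4.0, 4.0))
# Hypothetical asymmetric example.
ks_asym = ks_time_sharing((1.0, 2.0), (4.0, 6.0))
```

In the symmetric case each SBS transmits alone half of the time, which corresponds to the fair time sharing mentioned above.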

4. Repeated coverage game 

Assume the SBSs are synchronized (say at the frame level) and update their transmit
power every frame. We consider a discounted repeated game played over a large number
of frames. Taking into account the past behavior of the players allows us to construct
Pareto-optimal equilibrium strategies. The assumption that the players observe the
actions of all the other players at the end of each stage seems unrealistic in our network
game. We suppose here that the players are able to sense their environment to detect
the transmit power level of their neighbors. We model this situation using a graph
of strategic observation: SBS $i$ observes the power chosen by each neighbor SBS
$j \in G(i)$ in the graph $G$. Strategic signaling is an essential assumption to
guarantee a robust equilibrium condition for the proposed strategic action plan.
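A minimal sketch of this observation structure is given below. The graph is stored as an adjacency dictionary and the signal of a player is simply the restriction of the power profile to its neighbors; the ring topology and the power values are hypothetical.

```python
def strategic_signal(i, graph, powers):
    """Signal of player i: the powers of its neighbours in the observation graph."""
    return {j: powers[j] for j in graph[i]}

# Hypothetical ring of four SBSs: each one senses its two adjacent neighbours.
graph = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
powers = [0.2, 0.5, 0.2, 0.9]
signal_0 = strategic_signal(0, graph, powers)
```

Player 0 learns the powers of players 1 and 3 only, so a deviation by player 2 can reach it only through information relayed by its neighbors.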

The repeated game is denoted
$\Gamma = (\mathcal{S}, (\mathcal{T}_i)_{i \in \mathcal{S}}, (v_i)_{i \in \mathcal{S}}, (s_i)_{i \in \mathcal{S}})$,
where $\mathcal{S}$ is the set of players, $(\mathcal{T}_i)_{i \in \mathcal{S}}$ are the sets of
strategies, $(v_i)_{i \in \mathcal{S}}$ is the vector of the long-term payoffs and $s_i$ is the
strategic observation function of player $i$, $s_i : \mathcal{P} \to \mathcal{P}_{s_i}$, defined by
the observation graph $G$. If action $p \in \mathcal{P}$ was played in the last stage, player $i$
receives a strategic signal $s_i(p)$ which discloses information about his neighbors' actions.
From now on, we assume that, for a given transmission block or game stage $t \geq 1$, each
device $i$ knows and takes into account the actions that have been played by his neighbors
$G(i)$ in the past, before choosing his stage action $p_i(t)$. We denote by $p^i(t)$ the sequence
of actions played before time $t$ and observed by player $i$:
$p^i(t) = (p_i(t), (p_j(t-1))_{j \in G(i)})$.
The vector $h_i^t = (p^i(1), \ldots, p^i(t-1))$ is called the private history of player $i$ at
time $t$ and lies in the set
$\mathcal{H}_i^t = \left(\bigotimes_{j \in G(i)} [0, p_j^{\max}]\right)^{t-1}$.

Definition 1 (Players' strategies in the RG). A pure strategy for player $i \in \mathcal{S}$ is a
sequence of functions $(\tau_{i,t})_{t \geq 1}$ with

$$ \tau_{i,t} : \mathcal{H}_i^t \to [0, p_i^{\max}], \quad h_i^t \mapsto p_i(t). \tag{6} $$

The strategy of player $i$ will therefore be denoted by $\tau_i$, while the vector of strategies
$\tau = (\tau_1, \ldots, \tau_S)$ will be referred to as a joint strategy. A joint strategy $\tau$
induces in a natural way a unique plan of action $(p(t))_{t \geq 1}$. To each profile of powers
$p(t)$ corresponds a certain instantaneous payoff $u_i(p(t))$ for player $i$. In our setup, each
player does not care about what he gets on a given block but about what he gets over the whole
duration of the game. This is why we consider a payoff function resulting from averaging over
the instantaneous payoffs. In order to quantify the fact that the transmitters can value
short-term and long-term gains differently, we use the model of infinitely repeated games
with discounting [1]. The averaged payoff for player $i$ can then be defined as follows.

Definition 2 (Players' payoffs in the RG). Let $\tau = (\tau_1, \ldots, \tau_S)$ be a joint
strategy. The payoff for player $i \in \mathcal{S}$ is defined by:

$$ v_i(\tau) = \sum_{t \geq 1} \lambda (1 - \lambda)^{t-1}\, u_i(p(t)) \tag{7} $$

where $p(t)$ is the power profile of the action plan induced by the joint strategy $\tau$.

The parameter $0 < \lambda < 1$ is the discount factor, which is seen as the game stopping
probability [2]: the probability that the game stops at stage $t$ is thus
$\lambda (1 - \lambda)^{t-1}$. This shows that the discount factor is also useful to study
wireless games where a player enters/leaves the game.

At this point, equilibrium strategies in the repeated game can be defined.

Definition 3 (Equilibrium strategies in the RG). A joint strategy $\tau$ supports an equilibrium
of the repeated game defined by
$\Gamma = (\mathcal{S}, (\mathcal{T}_i)_{i \in \mathcal{S}}, (v_i)_{i \in \mathcal{S}}, (s_i)_{i \in \mathcal{S}})$ if

$$ \forall i \in \mathcal{S},\ \forall \tau_i', \quad v_i(\tau) \geq v_i(\tau_1, \ldots, \tau_{i-1}, \tau_i', \tau_{i+1}, \ldots, \tau_S). \tag{8} $$

4.1 Strategic signal on a Proximity Graph of Base Station

The strategic observation structure is fundamental in order to define an optimal action
plan and to guarantee a robust equilibrium condition at each time of the game. In
order to guarantee cooperation on an optimal operating point (such as the KS bargaining
solution), the SBSs should be able to detect any deviating behavior, identify the
SBS that deviates and start a punishment mechanism. Inspired by [3], we first
propose a repeated game strategy leading to the KS bargaining solution. Second, we
show that this strategy also satisfies a robust equilibrium condition at each time of
the game.

Definition 4. We propose a strategy that leads to a Pareto-optimal utility vector defined by the
KS bargaining solution, such that no deviation from this cooperative plan is profitable.

• The cooperative plan takes place from the first stage of the game and consists in playing
the KS bargaining power profile $p^{KS}$ (Section 3.2) as long as no deviation is detected.

• If a player deviates, then the procedure of the sets of suspects begins.
• After the end of the procedure, start an adequate punishment plan targeted at the
deviator.



Procedure of the set of suspects 

Using the results of Renault and Tomala (1998 [3]), we define the procedure of the set
of suspects, which spreads the identity of the deviator among all the players in a finite
number of stages. To implement this strategy, the graph of strategic signalling should
satisfy the following 2-connectivity property.

Definition 5. A graph $G$ is 2-connected if $G$ is connected and, for each $i \in \mathcal{S}$, the
graph $G \setminus \{i\}$ (obtained when player $i$ has been removed) is connected.
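Definition 5 can be tested directly on a given observation graph. The following sketch uses a breadth-first search on an adjacency dictionary; the two example graphs are hypothetical.

```python
from collections import deque

def is_connected(graph, removed=None):
    """Breadth-first search connectivity test, optionally ignoring one vertex."""
    nodes = [v for v in graph if v != removed]
    if not nodes:
        return True
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w != removed and w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == len(nodes)

def is_two_connected(graph):
    """Definition 5: connected, and still connected after removing any single vertex."""
    return is_connected(graph) and all(is_connected(graph, removed=v) for v in graph)

ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}  # 2-connected
path = {0: [1], 1: [0, 2], 2: [1]}                   # removing vertex 1 disconnects it
```

A ring of SBSs satisfies the property, whereas a chain does not, since removing a middle SBS cuts the flow of information between its two sides.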

We give an explicit description of the procedure of the set of suspects. We refer
to [3] for detailed proofs and intuitive examples. First, divide the stages into blocks of
length l = log(2 ). Each subset $N \subset \mathcal{S}$ of players is mapped into a sequence of
actions $p(1), \ldots, p(l)$ which is publicly known. These sequences of power levels encode
every subset of players and allow the players to communicate. If, during some
block $m$, a neighbor $j$ of player $i$ does not follow the main plan, then player $i$ builds
a set of suspects including player $j$ and other possible deviating neighbors. At the
beginning of block $m + 1$, he plays the sequence of moves corresponding to his set of
suspects. Then, from block to block, the players compare their sets of suspects with those of
their neighbors, adding the new suspects and excluding innocent players. The 2-connectivity
property is essential to clear all the players except the deviator. The date at which
a player $k$ enters the set of suspects of player $i$ and of player $j$ is predictable, knowing
the structure of the graph. If player $k$ did not deviate, then another player, say $i$,
will notice an inconsistency between the date at which $k$ entered the set of suspects and the
length of the shortest paths from $k$ to $i$. Player $i$ then exonerates player $k$, and so do his
neighbors in the next block. The identity of the deviator becomes common knowledge
after n = I · max(l, 21 - 5) stages.

Punishment Plan. Once the procedure of the set of suspects finishes, every SBS
knows the identity of the deviating SBS $i \in \mathcal{S}$. The punishment plan of the strategy
consists, for the neighbors $G(i)$ of player $i$, in playing the one-shot Nash equilibrium power
$p^{NE}$ until the end of the game, whereas the other players continue to play the optimal KS
bargaining power $p^{KS}$ (see Fig. 1).
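The punishment phase can be sketched as a simple rule applied once the deviator is known. In the sketch below the function name, the ring graph and the power values are hypothetical; the behaviour of the deviator itself is deliberately left out, since the plan only prescribes the actions of the other players.

```python
def punishment_actions(deviator, graph, p_ks, p_ne):
    """Powers prescribed to the non-deviating players after identification."""
    actions = {}
    for i in graph:
        if i == deviator:
            continue  # the deviator's own behaviour is not prescribed by the plan
        # Neighbours of the deviator switch to the NE power, the others keep the KS power.
        actions[i] = p_ne[i] if i in graph[deviator] else p_ks[i]
    return actions

# Hypothetical ring of four SBSs in which SBS 0 has been identified as the deviator.
graph = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
p_ks = [0.4, 0.4, 0.4, 0.4]
p_ne = [1.0, 1.0, 1.0, 1.0]
actions = punishment_actions(0, graph, p_ks, p_ne)
```

Only the SBSs adjacent to the deviator raise their power, so the punishment stays local and the remaining SBSs keep the cooperative operating point.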

Such a strategy is proved to lead to an optimal operating point such that no deviation
from the above algorithm is profitable.

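Theorem 6. If $G$ is 2-connected and the discount factor satisfies the following condition
for every player $i \in \mathcal{S}$:

$$ \lambda \leq 1 - \left(\frac{\bar{u}_i - u_i^{KS}}{\bar{u}_i - u_i^{NE}}\right)^{1/n}, \tag{9} $$

then the above strategy is an equilibrium strategy.

Note that any other feasible utility vector characterized by the Folk theorem could
be considered here.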
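Proof. Suppose a player $i$ deviates. The procedure of the set of suspects then spreads his
identity among the players, and the punishment phase pushes his payoff down to at most
$u_i^{NE}$ until the end of the game. Bounding the stage payoff of the deviator by $\bar{u}_i$
during the $n$ stages of the identification procedure, the deviation is not profitable if the
total deviation payoff does not exceed the payoff of the cooperative plan:

$$ \sum_{s=1}^{n} \lambda (1 - \lambda)^{s-1} \bar{u}_i + \sum_{s \geq n+1} \lambda (1 - \lambda)^{s-1} u_i^{NE} \leq \sum_{s=1}^{n} \lambda (1 - \lambda)^{s-1} u_i^{KS} + \sum_{s \geq n+1} \lambda (1 - \lambda)^{s-1} u_i^{KS} $$

$$ \Longleftrightarrow \quad \left(1 - (1 - \lambda)^n\right) (\bar{u}_i - u_i^{KS}) \leq (1 - \lambda)^n (u_i^{KS} - u_i^{NE}) $$

$$ \Longleftrightarrow \quad \lambda \leq 1 - \left(\frac{\bar{u}_i - u_i^{KS}}{\bar{u}_i - u_i^{NE}}\right)^{1/n}. $$

The recursive structure of the discounted utility function implies that this inequality does not
depend on a particular stage. This condition on the stopping probability $\lambda$ ensures that
the equilibrium condition is valid for every player in every stage of the game; thus the proposed
strategy is an equilibrium strategy of the repeated game.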

□ 
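The comparison made in the proof can be checked numerically. The sketch below assumes that a deviator earns at most $\bar{u}_i$ during the $n$ identification stages and $u_i^{NE}$ afterwards, and uses the closed form of the geometric sums; the function names and the numerical values are hypothetical.

```python
def discount_threshold(u_max, u_ks, u_ne, n):
    """Largest stopping probability for which one player's deviation is unprofitable."""
    return 1.0 - ((u_max - u_ks) / (u_max - u_ne)) ** (1.0 / n)

def deviation_gain(lam, u_max, u_ks, u_ne, n):
    """Deviation payoff minus cooperative payoff, using closed-form geometric sums."""
    w = (1.0 - lam) ** n  # total weight of the stages after identification
    deviate = (1.0 - w) * u_max + w * u_ne
    return deviate - u_ks  # the cooperative plan yields u_ks at every stage

# Hypothetical utilities: maximum 4, KS 2, NE 1, identification over n = 3 stages.
lam_star = discount_threshold(4.0, 2.0, 1.0, 3)
gain_at_threshold = deviation_gain(lam_star, 4.0, 2.0, 1.0, 3)
gain_patient = deviation_gain(lam_star / 2, 4.0, 2.0, 1.0, 3)
gain_impatient = deviation_gain(2 * lam_star, 4.0, 2.0, 1.0, 3)
```

The gain vanishes at the threshold, is negative for smaller stopping probabilities (the players value the future enough for the punishment to deter) and positive for larger ones.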



5. Numerical Results 

Our theoretical results are illustrated with a simple model of two SBSs. In Figure 1,
we derive the achievable utility region for our two base stations, considering the utilities
defined by equation (3), where the densities $\lambda_i(x, y)$ are assumed to be uniform and
the channel gains $g_{ii}(x, y)$ are constant over the range of SBS $i$. This figure depicts
the utility region and its convex hull, together with the Nash equilibrium utility and the KS
bargaining utility. The repeated game approach improves the utility of both SBSs.

6. Concluding remark 

We have studied proximity-based network coverage games and have shown that the
KS bargaining utility can be achieved in our model of discounted repeated games.
This result is general and could be applied to a larger class of communication games
over a fixed network. It would be worthwhile to investigate an application of the suspects
procedure to classical web protocols. From the game-theoretical point of view, it would
be interesting to extend this result to subgame perfect equilibria, to relax the
information hypothesis, or to introduce a stochastic graph.
Figure 1: Utility region of coverage repeated game.